Abstract. Coding theory has played a central role in the development of computer science. One critical point of interaction is decoding error-correcting codes. First- and second-order Reed-Muller (RM(1) and RM(2), respectively) codes are two fundamental error-correcting codes which arise in communication as well as in probabilistically checkable proofs and learning. In this paper, we take the first steps toward extending the quick randomized decoding tools of RM(1) into the realm of quadratic binary and, equivalently, Z4 codes. Our main algorithmic result is an extension of the RM(1) techniques from the Goldreich-Levin and Kushilevitz-Mansour algorithms [GL89, KM91] to the Hankel code [CGL+05], a code between RM(1) and RM(2). That is, given signal $s$ of length $N$, we find a list that is a superset of all Hankel codewords $\varphi$ with $|\langle s, \varphi\rangle|^2 \ge (1/k)\|s\|^2$, in time $\mathrm{poly}(k, \log N)$. We then turn our attention to the widely-studied Kerdock codes. We give a new and simple formulation of a known Kerdock code as a subcode of the Hankel code. We then get two immediate corollaries. First, our new Hankel list-decoding algorithm covers subcodes, including the new Kerdock construction, so we can list-decode Kerdock, too. Furthermore, exploiting the fact that dot products of distinct Kerdock vectors have small magnitude, we get a quick algorithm for finding a sparse Kerdock approximation. That is, for $k$ small compared with $1/\sqrt{N}$ and for $\epsilon > 0$, we find, in time $\mathrm{poly}(k \log(N)/\epsilon)$, a $k$-Kerdock-term approximation $\tilde{s}$ to $s$ with Euclidean error at most the factor $(1 + \epsilon + O(k^2/\sqrt{N}))$ times that of the best such approximation.

1. Introduction 

Coding theory and computation have enjoyed a long and fruitful interaction. Decoding a received codeword is inherently an algorithmic problem and, conversely, codes have been used as key components of algorithms for many purposes, including pseudorandomness, probabilistically checkable proofs, learning, and cryptography. The computational view of codes can also provide important insights for coding theory and code construction. See [Sud00, Sud01] and the references therein for a sample of this fruitful interaction.

Because decoding is inherently an algorithmic problem, it is natural to analyze the computational 
cost of decoding a received codeword. We can quantify how much time and space we need to decode 
a vector which has been corrupted according to a variety of noise models. In this paper, we are 
interested in how many samples of the received codeword are necessary for decoding, how much 
noise we can tolerate in the input, and how quickly we can decode using just a few random samples 
in the presence of this noise. 

The first- and second-order binary Reed-Muller codes RM(1) and RM(2) are fundamental in the 
study of codes and their applications to algorithms. An RM(1) codeword of dimension n can be
regarded as a binary linear function on n variables and an RM(2) codeword is a quadratic function
on n variables. As such, they are fundamental expressive classes, used in proofs and learning as 
well as error-free communication. 
Binary RM(1), in particular, admits highly efficient algorithms for decoding, even in the presence of noise. We are interested in a form of decoding that has appeared many times before with various names, and that we call Euclidean List Decoding. The first quick algorithms for Euclidean list decoding of RM(1) are in [GL89, KM91]. Given a (multiplicatively-written) linear function $f : \mathbb{Z}_2^n \to \{\pm 1\}$, one can recover $f$ by querying its value on just $\mathrm{poly}(n)$ values of its graph, instead of all $2^n$ values. Furthermore, the decoding succeeds even in the presence of a lot of noise; i.e., if the noise $\nu$ is orthogonal to the signal $f$ and if we assume only that $\|f\|^2 \ge (1/k)\|\nu\|^2$, then the algorithm on $f + \nu$ takes time polynomial in $kn$ and returns a (short) list of possible $f$'s. See [Sud00] for a discussion of list decoding algorithms and their applications. We note that this problem can be solved more generally using nearest-neighbor data structures [Ind00], but the general solution requires space and preprocessing time $N = 2^n$, which we want to avoid.

While the available techniques for RM(1) make it useful in many applications, RM(1) is limited in several important ways compared with RM(2). First, there are only $2^n$ RM(1) codewords, while there are approximately $2^{n^2/2}$ RM(2) codewords, so, quantitatively, RM(2) is more expressive. But there are important structural differences, as well. When used to express a concept or to code a computation, an RM(1) codeword as a function considers its variables one at a time, while RM(2) codewords consider their variables in pairs. First-order Reed-Muller codewords form an orthonormal basis, while RM(2) forms a highly redundant dictionary (a collection of more than $N$ vectors spanning a vector space of dimension $N$) that is potentially much more useful for lossy compression. When used as a pseudorandom number generator, RM(1) provides a family of 3-wise independent random variables and RM(2) provides a family of 7-wise independent random variables.^1 Because of the extra expressiveness of RM(2), however, many tools from the first-order theory do not apply. For example, we do not know how to recover an RM(2) vector in the presence of noise unless the noise is slight [AKK+03].

In this paper, we take the first steps toward extending the decoding tools of RM(1) into the realm of quadratic binary (and, equivalently, Z4) codes. We show how to recover Hankel codewords [CGL+05] efficiently in the presence of noise, giving a result analogous to what one can do with RM(1) up to a polynomial in the parameters. The Hankel code is the union of cosets of RM(1), i.e., $\bigcup_{\varphi \in Q} \varphi\,\mathrm{RM}(1)$ for some $Q$ of size $q$, so that Hankel can be regarded as the union of $q$ orthonormal bases, each equivalent to RM(1). It follows immediately that one can use the [KM91] algorithm $q$ times to do list-decoding over the union of $q$ equivalent copies of RM(1), but only at time cost $q$ times the cost of one instance of the algorithm in [KM91]. Hankel consists of $q = \Theta(N^2)$ copies of RM(1), however, so the cost of such a trivial algorithm would be prohibitive. By contrast, we list-decode Hankel in total time $\mathrm{poly}(k, \log N)$. Such efficient list-decoding is possible only by confluence of the choice of dictionary (Hankel) and the algorithm, and represents an important way in which our contribution is significant.

We also give a new, simple construction of a code in the well-studied class of Kerdock codes. Our Kerdock construction $\mathcal{K}$ is a subcode of the Hankel code $\mathcal{H}$, which implies immediately that our Hankel list-decoding algorithm applies also to our Kerdock construction. Thus we have $\mathrm{RM}(1) \subset \mathcal{K} \subset \mathcal{H} \subset \mathrm{RM}(2)$. While Kerdock and Hankel are still in some important respects more limited than RM(2), they are great improvements over RM(1). For example, a random codeword from a Kerdock code (and, therefore, from the Hankel code) provides a family of 5-wise independent random variables. Each Kerdock code has $N^2$ vectors and the Hankel code has $\Theta(N^3)$ vectors, compared with $\Theta(N)$ for RM(1) and $2^{\Theta(\log^2 N)}$ for RM(2). Kerdock represents a substantial, well-studied family of quadratic functions with advantages over RM(1) in the areas of coding theory [HKC+94], radar signaling [HCM06], and spread-spectrum communication.

^1 That is, if we fix any three indices $y_1, y_2, y_3$ into an unknown codeword $\varphi$ and then choose an RM(1) codeword $\varphi$ at random, the random variables $\varphi(y_1), \varphi(y_2), \varphi(y_3)$ are jointly independent. If we choose an RM(2) codeword at random, any 7 positions are independent.

Finally, the previous work in [TGMS03, GMS03] demonstrates that we can use a fast list-decoding algorithm to find a sparse representation efficiently. That is, exploiting the fact that dot products of distinct Kerdock vectors have small magnitude, we get a quick algorithm for finding a sparse Kerdock approximation. More specifically, for any $k < 1/(6\sqrt{N})$ and any $\epsilon > 0$, we can find, in time $\mathrm{poly}(k, \log N, 1/\epsilon)$, a $k$-Kerdock-term approximation $\tilde{s}$ to $s$ with Euclidean error at most the factor $(1 + \epsilon + O(k^2/\sqrt{N}))$ times that of the best such approximation.

This paper is organized as follows. In Section 2 we give preliminaries about finite fields, Reed-Muller codes, and Kerdock codes. We also include a discussion of related work. In Section 3 we give a new, computational construction of a Kerdock code, as a subcode of the Hankel code. In Section 4 we give our algorithm for fast list decoding of the Hankel code. Finally, we give corollaries of our main result concerning list-decoding and sparse recovery of Kerdock codes, as well as indications about directions for improvement.

2. Preliminaries 

2.1. Finite fields. To outline the setting in which Kerdock codes are defined, we begin with the definition of finite fields and the algebra we perform over these fields. Let $h(t)$ be a polynomial of degree $n$ over Z2 that is primitive, i.e., $h(t)$ does not divide $t^k - 1$ for any $k < 2^n - 1$. Because $h$ is a primitive (and hence, irreducible) polynomial, it has no non-trivial factorization.

The ring of polynomials modulo $h$, $\mathbb{Z}_2[t]/h$, forms a field of $2^n$ elements. We denote this field $\mathbb{F}(2^n)$. The polynomial $\xi = t$ is a (multiplicative) generator of the field; thus, the set $\{1, \xi, \xi^2, \ldots, \xi^{2^n-2}\}$ enumerates the non-zero elements of the field. Additively, the field $\mathbb{F}(2^n)$ is a vector space $\mathbb{Z}_2^n$ over Z2 of dimension $n$ with basis $\{1, \xi, \xi^2, \ldots, \xi^{n-1}\}$. It is also a quotient vector space of $\mathbb{Z}^n$. When we want to emphasize the vector formulation of a field element $a$, we write $[a]$ for a column vector. Thus $[1], [\xi], [\xi^2], \ldots, [\xi^{n-1}]$ are the canonical basis vectors. Below, we will often want to consider these $\{0,1\}$-valued vectors to be in $\mathbb{Z}_2^n$, $\mathbb{Z}_4^n$, or $\mathbb{Z}^n$ for the purposes of dot products. We will write, e.g., $i^{[y]^T Q [y] + 2\ell^T [y]}$, where $y$ is a field element, $Q$ is a $\{0,1\}$-valued matrix, and $\ell$ is a $\{0,1\}$-valued vector. Note that all the arithmetic in the exponent can be done over $\mathbb{Z}$, where $[y]$ is a $\{0,1\}$-valued vector. Since the exponent is an exponent of $i$, arithmetic can equivalently be done mod 4. Finally, since 2 multiplies $\ell^T [y]$, the dot product of $\ell$ and $[y]$ can be performed mod 2. For any $x \in \mathbb{F}(2^n)$, we have $x^{2^n} = x$ so that $\sqrt{x} = x^{2^{n-1}}$. Because 2 is congruent to 0 mod 2, we have $(x + y)^2 = x^2 + y^2$ for any $x, y \in \mathbb{F}(2^n)$ and, by repeated squaring, $(x + y)^{2^j} = x^{2^j} + y^{2^j}$.

The trace of an element $x \in \mathbb{F}(2^n)$ is an important quantity we use in defining and constructing Kerdock codes.

Definition 1. The trace of $x \in \mathbb{F}(2^n)$, $\mathrm{Tr}(x)$, is defined to be

$$\mathrm{Tr}(x) = \sum_{0 \le j < n} x^{2^j} = x + x^2 + \cdots + x^{2^{n-1}}.$$

The following lemma gives the properties we need of the trace map. We give the simple proof 
for completeness. 

Lemma 2. We have

• For $x, y \in \mathbb{F}(2^n)$ and $a, b \in \mathbb{F}(2)$, we have $\mathrm{Tr}(ax + by) = a\,\mathrm{Tr}(x) + b\,\mathrm{Tr}(y)$.

• The image of Tr is in Z2.

• The trace is not identically 0.

Proof. (Repeated) squaring of an element is a linear operator, so $\mathrm{Tr}(ax + by) = a\,\mathrm{Tr}(x) + b\,\mathrm{Tr}(y)$.

Again by linearity of squaring, $\mathrm{Tr}(x^2) = \mathrm{Tr}(x)^2$. Since $x^{2^n} = x$, we have $\mathrm{Tr}(x) = \mathrm{Tr}(x^2) = \mathrm{Tr}(x)^2$. Thus $\mathrm{Tr}(x)$ satisfies $y = y^2$, whence $\mathrm{Tr}(x) \in \mathbb{F}(2)$. For $n$ odd, $\mathrm{Tr}(1) = 1$. (Lemma 11 shows that $\mathrm{Tr} \not\equiv 0$ for even $n$, as well.) ■

Thus Tr is an additive homomorphism from the big field $\mathbb{F}(2^n)$ to the prime subfield $\mathbb{F}(2) = \mathbb{Z}_2$, so $\mathrm{Tr}(x) = 0$ for exactly half of the field elements. It is not necessarily true that $\mathrm{Tr}(xy) = \mathrm{Tr}(x)\mathrm{Tr}(y)$. Finally, note that $\mathrm{Tr}(1)$ is 0 or 1 if $n$ is even or odd, respectively.
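
As an illustration of Definition 1 and Lemma 2, the following minimal sketch computes the trace over $\mathbb{F}(2^3)$. It assumes the primitive polynomial $h(t) = 1 + t^2 + t^3$ and stores each field element as a bitmask of its polynomial coefficients; the names `gf_mul`, `trace`, `N_BITS`, and `H_POLY` are illustrative choices, not notation from the paper.

```python
N_BITS = 3
H_POLY = 0b1101  # bitmask of h(t) = 1 + t^2 + t^3 (bit j holds the coefficient of t^j)

def gf_mul(a, b, h=H_POLY, n=N_BITS):
    """Multiply two elements of F(2^n), each stored as a bitmask of coefficients."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current shift of a
        b >>= 1
        a <<= 1                  # multiply a by xi
        if (a >> n) & 1:
            a ^= h               # reduce modulo h(t)
    return result

def trace(x, n=N_BITS):
    """Tr(x) = x + x^2 + x^4 + ... + x^(2^(n-1)), computed by repeated squaring."""
    total, power = 0, x
    for _ in range(n):
        total ^= power
        power = gf_mul(power, power)
    return total

field = range(2 ** N_BITS)
traces = [trace(x) for x in field]

# The three properties of Lemma 2, checked exhaustively for n = 3.
additive = all(trace(x ^ y) == trace(x) ^ trace(y) for x in field for y in field)
binary_valued = all(t in (0, 1) for t in traces)
not_identically_zero = any(traces)
print(traces, additive, binary_valued, not_identically_zero)
```

Addition in the field is bitwise XOR, which is why additivity of the trace can be tested with `^` on both sides.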

2.2. Definitions of RM(1,n) and RM(2,n). We review the definitions of the two codes, first- and second-order Reed-Muller codes (RM(1,n) and RM(2,n), respectively), which sandwich Kerdock codes. Fix a parameter $n$.

Definition 3. Let $\ell \in \mathbb{Z}_2^n$ be a binary vector of length $n$ and let $\epsilon \in \mathbb{Z}_2$. The first-order Reed-Muller code RM(1,n) of length $N = 2^n$ is defined as a set of vectors $v_{\ell,\epsilon}$ indexed by $\ell$ and $\epsilon$. Each codeword $v_{\ell,\epsilon}$ at position $[y] \in \mathbb{Z}_2^n$ is given by

$$v_{\ell,\epsilon}([y]) = 2(\ell^T [y] + \epsilon) \bmod 4.$$

The exponentiated form of RM(1,n) is given by

$$\varphi_{\ell,\epsilon}([y]) = \frac{1}{\sqrt{N}}\, i^{v_{\ell,\epsilon}([y])} = \frac{1}{\sqrt{N}}\, i^{2(\ell^T [y] + \epsilon)}.$$

We normalize the codevectors by $\sqrt{N}$ in the exponentiated form to obtain unit vectors.

Definition 4. Let $Q$ be an $n \times n$ symmetric matrix over $\mathbb{Z}_2$, let $\ell \in \mathbb{Z}_2^n$ be a binary vector of length $n$, and let $\epsilon \in \mathbb{Z}_4$. The second-order Reed-Muller code RM(2,n) of length $N = 2^n$ is defined as a set of vectors indexed by $Q$, $\ell$, and $\epsilon$. Each codeword $w_{Q,\ell,\epsilon}$ at position $[y] \in \mathbb{Z}_2^n$ is given by

$$w_{Q,\ell,\epsilon}([y]) = ([y]^T Q [y] + 2\ell^T [y] + \epsilon) \bmod 4.$$

The exponentiated form of RM(2,n) is given by

$$\varphi_{Q,\ell,\epsilon}([y]) = \frac{1}{\sqrt{N}}\, i^{[y]^T Q [y] + 2\ell^T [y] + \epsilon}.$$

Below, we will sometimes abbreviate the index $(Q, \ell, \epsilon)$ as $\lambda$, so that $\varphi_{Q,\ell,\epsilon} = \varphi_\lambda$. Again, we normalize the codevectors in the exponentiated form so they are unit vectors. Observe that if $Q = 0$, then the subset of RM(2,n) codewords given by $\varphi_{0,\ell,\epsilon}$ are, in fact, RM(1,n) codewords.

We frequently drop the index $\epsilon$ since $i^\epsilon$ represents a unit factor that can be absorbed into a more general coefficient $c_{Q,\ell}$ of $c_{Q,\ell}\,\varphi_{Q,\ell}$.

In other literature, both RM(1) and RM(2) are presented as binary codes. Our theory can be 
formulated for both Z2 and Z4, but we stick to Z4 after giving the equivalence between previous 
work and ours. We will consider RM(1) and RM(2) over Z4, as above, since the Kerdock codes are 
most natural over Z4 — they are nonlinear binary codes but linear over Z4. 

We say that a code with entries in Z4 is a Z4-code while one with entries in Z2 is a Z2-code. The 
two previous definitions of RM(l,n) and RM(2,n) both result in Z4-codes. The Z2 Reed-Muller 
codes may be more familiar to the reader and we often want to relate a Z4-code to a Z2-code. We 
do so via the Gray map. 

Definition 5. The Gray map, $\mathrm{gr} : \mathbb{Z}_4 \to \mathbb{Z}_2^2$, is given by

gr(0) = 00, gr(1) = 01, gr(2) = 11 and gr(3) = 10.

We sometimes use the exponential version, from $\{\pm 1, \pm i\}$ to $\{\pm 1\}^2$, given by

gr(+1) = (+1,+1), gr(+i) = (+1,-1), gr(-1) = (-1,-1), and gr(-i) = (-1,+1).

Further overloading notation, $\mathrm{gr} : \mathbb{Z}_4^N \to \mathbb{Z}_2^{2N}$ is gotten by applying the Gray map to each of $N$ elements in a vector in $\mathbb{Z}_4^N$, getting $N$ elements in $\mathbb{Z}_2^2$, and similarly for the exponential versions.
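
The Gray map of Definition 5 is easy to state in code. The sketch below is illustrative only: the function names are our own, and the final check uses the standard fact (not stated in the text above) that the Gray map carries Lee distance on Z4 vectors to Hamming distance on their binary images.

```python
import itertools

GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def gray_map(codeword):
    """Apply gr to each Z4 symbol, doubling the length of the vector."""
    bits = []
    for symbol in codeword:
        bits.extend(GRAY[symbol % 4])
    return bits

def gray_map_exponential(values):
    """Exponential version: i^a maps to ((-1)^b1, (-1)^b2) where gr(a) = b1 b2."""
    powers = {1: 0, 1j: 1, -1: 2, -1j: 3}
    return [(-1) ** bit for v in values for bit in GRAY[powers[v]]]

def lee_distance(u, v):
    """Sum over positions of the cyclic distance in Z4."""
    return sum(min((a - b) % 4, (b - a) % 4) for a, b in zip(u, v))

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

# Exhaustive check of the distance-preserving property on Z4 vectors of length 2.
vectors = list(itertools.product(range(4), repeat=2))
is_isometry = all(
    lee_distance(u, v) == hamming_distance(gray_map(u), gray_map(v))
    for u in vectors for v in vectors
)
print(gray_map([0, 1, 2, 3]), is_isometry)
```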

Equivalently, one can transform $Q$, a Z4-valued quadratic form on $\mathbb{Z}_2^n$, to $M$, a Z2-valued quadratic form on $\mathbb{Z}_2^{n+1}$. The quadratic form $Q$ is an $n \times n$ binary symmetric matrix while $M$ is an $(n+1) \times (n+1)$ binary skew symmetric matrix (that is, $M$ has zero diagonal). Let the row vector $d_Q$ be the diagonal of $Q$. Then Calderbank et al. [CCKS97] show that the correspondence between binary symmetric matrices $Q$ and binary skew symmetric matrices $M$ is given by

$$M = \begin{pmatrix} 0 & d_Q \\ d_Q^T & d_Q^T d_Q + Q \end{pmatrix}, \qquad (1)$$

where the "extra" bit in the top row and left column is used as an index into the two outputs of the Gray map. This correspondence is not linear but it is rank preserving in the sense that if $M$ has rank $n + 1 - 2j$ then $Q$ has rank $n + 1 - 2j$ or $n - 2j$ for any integer $j$, $0 \le j \le (n-1)/2$.

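In summary, the following commutative diagram relates codewords and codeword labels in the Z2 and Z4 formulations:

$$\begin{array}{ccc} \mathbb{Z}_4 \text{ label} & \xrightarrow{\ (1)\ } & \mathbb{Z}_2 \text{ label} \\ \downarrow & & \downarrow \\ \mathbb{Z}_4 \text{ codeword} & \xrightarrow{\ \mathrm{gr}\ } & \mathbb{Z}_2 \text{ codeword} \end{array}$$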

The following theorem of Calderbank et al. [CCKS97] relates the rank of the binary symmetric matrices $Q_1$ and $Q_2$ to the magnitude of the dot product between two codewords generated with the respective matrices.

Theorem 6. Let $Q_1$ and $Q_2$ be binary symmetric $n \times n$ matrices and let $\varphi_{Q_1,\ell_1,\epsilon_1}$ and $\varphi_{Q_2,\ell_2,\epsilon_2}$ be distinct exponentiated Z4-RM(2,n) codewords. If $\mathrm{Rank}(Q_1 - Q_2) = R$, then

$$|\langle \varphi_{Q_1,\ell_1,\epsilon_1}, \varphi_{Q_2,\ell_2,\epsilon_2} \rangle| \in \{0, 2^{-R/2}\}.$$

In particular, if $\ell_1 = \ell_2$, then the magnitude of the dot product is $2^{-R/2}$.
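
The main claim of Theorem 6 can be checked numerically for small $n$. The following sketch builds exponentiated codewords directly from Definition 4 (with $\epsilon = 0$) and verifies that every pairwise dot product has magnitude $0$ or $2^{-R/2}$, where $R$ is the rank of $Q_1 - Q_2$ over $\mathbb{F}(2)$. The helper names (`codeword`, `rank_gf2`, `check_theorem6`) are our own, and the exhaustive loop is only practical for very small $n$; it is a sanity check, not a proof.

```python
import itertools
import numpy as np

POWERS_OF_I = np.array([1, 1j, -1, -1j])

def codeword(Q, ell, eps=0):
    """Exponentiated RM(2, n) codeword: i^([y]^T Q [y] + 2 ell^T [y] + eps) / sqrt(N)."""
    n = len(ell)
    ys = np.array(list(itertools.product([0, 1], repeat=n)))
    exponent = np.einsum("yj,jk,yk->y", ys, Q, ys) + 2 * ys @ ell + eps
    return POWERS_OF_I[exponent % 4] / np.sqrt(2 ** n)

def rank_gf2(M):
    """Rank of an integer matrix reduced mod 2, by Gaussian elimination over F(2)."""
    M = np.array(M) % 2
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]
        rank += 1
    return rank

def symmetric_binary_matrices(n):
    upper = [(j, k) for j in range(n) for k in range(j, n)]
    for bits in itertools.product([0, 1], repeat=len(upper)):
        Q = np.zeros((n, n), dtype=int)
        for (j, k), b in zip(upper, bits):
            Q[j, k] = Q[k, j] = b
        yield Q

def check_theorem6(n):
    """True if every pair of codewords has dot-product magnitude 0 or 2^(-R/2)."""
    ells = [np.array(l) for l in itertools.product([0, 1], repeat=n)]
    for Q1 in symmetric_binary_matrices(n):
        for Q2 in symmetric_binary_matrices(n):
            R = rank_gf2(Q1 - Q2)
            for l1 in ells:
                for l2 in ells:
                    mag = abs(np.vdot(codeword(Q1, l1), codeword(Q2, l2)))
                    if not (np.isclose(mag, 0) or np.isclose(mag, 2 ** (-R / 2))):
                        return False
    return True

print(check_theorem6(2))
```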

2.3. Definition of Kerdock codes. A Kerdock code is associated with a Kerdock set of matrices. The definition of the latter is non-constructive.

Definition 7. A Kerdock set $\mathcal{K}$ is a set of $n \times n$ binary symmetric matrices, including zero, of size $n$ such that for any distinct $P_1, P_2 \in \mathcal{K}$, the rank of $(P_1 + P_2)$ over $\mathbb{F}(2)$ is $n$.

In particular, any non-zero $P \in \mathcal{K}$ has full rank. We take these matrices $P$ to be quadratic forms over Z4. Each Kerdock set has size at most $N = 2^n$, since distinct elements of a Kerdock set must have distinct top rows. In fact, Kerdock sets can achieve maximal size (see below).

Definition 8. A Kerdock code K(n) of length $N = 2^n$ is defined as a set of vectors $c_{P,\ell,\epsilon}$ indexed by $P$, $\ell$, and $\epsilon$. Each codeword $c_{P,\ell,\epsilon}$ at position $[y] \in \mathbb{Z}_2^n$ is given by

$$c_{P,\ell,\epsilon}([y]) = ([y]^T P [y] + 2\ell^T [y] + \epsilon) \bmod 4,$$

where $P \in \mathcal{K}$ comes from a Kerdock set.
2.4. Related Work. The work most closely related to our decoding algorithm is that of [GL89, KM91], which was already discussed. Similar sparse decoding of the Fourier basis (over $\mathbb{Z}_N$, not $\mathbb{Z}_2^n$) was given in [Man95, GGI+02, AGS03]. Other work on local testing of codes [KL05] focuses on limiting the number of samples, but not the runtime. Still other work on list-decoding of Reed-Muller codes [Sud01] focuses on large alphabets, whereas we work over Z2. The problem of testing low-degree polynomials [AS97] is different from decoding, which is what we do for special quadratic polynomials. We note that [AKK+03], in addition to giving lower bounds on the number of samples for testing binary Reed-Muller codes, also give a decoding algorithm for a single Reed-Muller vector in the presence of very small noise.

As for construction of Kerdock codes, the history is as follows. Kerdock codes were first defined [MS77] non-constructively in terms of the allowable quadratic forms. Later, in the breakthrough paper [CCKS97], the authors give algebraic constructions of Kerdock codes that provide a rich set of symmetries, but the algebra included theory somewhat beyond finite fields. Independent of and somewhat earlier than the publication of our work, a construction of a Kerdock code similar to ours is given in [HSP06]. Both the construction in [HSP06] and our construction here are isomorphic, in some sense, to a construction in [CCKS97]. We believe our construction is a bit simpler than [HSP06]; indeed, to get the Hankel structure we need here, it is simpler for us to give Definition 9 (below) from scratch than to adapt the construction in [HSP06]. As additional value beyond [HSP06], we also contribute a self-contained proof of correctness of the construction, simplifying the proof in [CCKS97] ([HSP06] offers no new proof of correctness). We also give an important new characterization of the construction, Lemma 15.



3. New definition of Kerdock codes

In this section, we present a construction of Kerdock codes.

3.1. Kerdock matrices. In each construction, a Kerdock code of length $N = 2^n$ is a subset of the RM(2,n) code $\{\varphi_{Q,\ell}\}$ satisfying an appropriate restriction on the binary symmetric matrix $Q$. We call these matrices Kerdock matrices. Roughly speaking, they are a restricted set of binary Hankel^2 matrices where the top row of the Hankel matrix consists of arbitrary entries and each of the remaining reverse diagonals is gotten from a fixed linear combination of the previous $n$ reverse diagonals.

Let $h(t) = h_0 + h_1 t + \cdots + h_{n-1} t^{n-1} + t^n$ be a primitive polynomial over Z2 of degree $n$. The coefficients of this polynomial are the coefficients in our fixed linear mapping.

Definition 9. An $n \times n$ linear-feedback-Kerdock matrix (briefly, lf-Kerdock matrix) is a Hankel matrix where the top row of the matrix $a_0, a_1, \ldots, a_{n-1}$ consists of $n$ arbitrary values in Z2 and the $j$th reverse diagonal parameter for $j \ge n$ is a fixed linear combination of the previous $n$ reverse diagonal parameters, given by

$$a_j = \sum_{0 \le \ell < n} h_\ell\, a_{j-n+\ell}.$$

(See Section 3.2 for an example.) We denote by $\mathcal{K}$ the set of lf-Kerdock matrices. Next, we provide what turns out to be an equivalent definition of lf-Kerdock matrices, called trace-Kerdock matrices.

Definition 10. An $n$-by-$n$ trace-Kerdock matrix $K_\alpha$ is the matrix whose $(j, k)$ position is $\mathrm{Tr}(\alpha \xi^{j+k})$ for some $\alpha$ in $\mathbb{F}(2^n)$.

^2 A Hankel matrix is constant along reverse diagonals.
Note that the set of trace-Kerdock matrices is Z2-linear, meaning the sum (mod 2) of two trace-Kerdock matrices is itself a trace-Kerdock matrix, by additivity of the trace. Kerdock codes, however, are Z4-linear but not Z2-linear.

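Lemma 11. Trace-Kerdock matrices have full rank.

Proof. Given trace-Kerdock matrix $K_\alpha$ for $\alpha \neq 0$, regard it as a matrix over $\mathbb{F}(2^n)$. Since $K_\alpha$ consists of 0's and 1's, the determinant of $K_\alpha$ over $\mathbb{F}(2^n)$ is the same as the determinant over Z2. The matrix $K_\alpha$ factors as $K_\alpha = V^T D_\alpha V$, where

$$V = \begin{pmatrix} 1 & \xi & \xi^2 & \cdots & \xi^{n-1} \\ 1 & \xi^2 & \xi^4 & \cdots & \xi^{2(n-1)} \\ 1 & \xi^4 & \xi^8 & \cdots & \xi^{4(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \xi^{2^{n-1}} & \xi^{2 \cdot 2^{n-1}} & \cdots & \xi^{2^{n-1}(n-1)} \end{pmatrix}$$

is Vandermonde and $D_\alpha = \mathrm{diag}(\alpha, \alpha^2, \alpha^4, \alpha^8, \ldots, \alpha^{2^{n-1}})$. Over the big field,

$$\det(D_\alpha) = \alpha^{1+2+4+\cdots+2^{n-1}} = \alpha^{2^n-1} = 1$$

and the Vandermonde parameters $\xi, \xi^2, \xi^4, \ldots$ are distinct, so $V$ is non-singular. It follows that $\det(K_\alpha) \neq 0$ over the field, so $\det(K_\alpha) = 1$. ■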

Note that the factorization $K_\alpha = V^T D_\alpha V$ also shows that $K_x J K_y J = K_{xy} J$, where $J = (V^T V)^{-1} = R^{-1}$. Thus the map $x \mapsto K_x J$ is a non-trivial multiplicative homomorphism from field elements to matrices. Since squaring is linear, $x \mapsto D_x$ and, so, $x \mapsto K_x J = V^T D_x (V^T)^{-1}$ are additive homomorphisms. It follows that $x \mapsto K_x J$ is a field homomorphism, so, in conjunction with the matrix $J$, the trace-Kerdock matrices can be regarded as field elements.

Lemma 12. Every trace-Kerdock matrix is a lf-Kerdock matrix.

Proof. Fix a trace-Kerdock matrix $K_\alpha$. It is Hankel by inspection, since the matrix entry $K_\alpha(j, k) = \mathrm{Tr}(\alpha \xi^{j+k})$ depends only on $j + k$. Note that the first $n$ diagonal parameters are given by

$$\mathrm{Tr}(\alpha),\ \mathrm{Tr}(\alpha\xi),\ \mathrm{Tr}(\alpha\xi^2),\ \ldots,\ \mathrm{Tr}(\alpha\xi^{n-1}).$$

Fix $j$ and $k$ with $j < n$, $k < n$, and $j + k \ge n$. Then the $j + k$ reverse diagonal of $K_\alpha$, which can be taken mod 2, is $[\xi^j]^T K_\alpha [\xi^k] = \mathrm{Tr}(\alpha \xi^{j+k})$. Using additivity of the trace,

$$\mathrm{Tr}(\alpha \xi^{j+k}) = \mathrm{Tr}(\alpha \xi^{j+k-n} \xi^n) = \mathrm{Tr}\Big(\alpha \xi^{j+k-n} \sum_{\ell < n} h_\ell \xi^\ell\Big) = \sum_{\ell < n} h_\ell\, \mathrm{Tr}(\alpha \xi^{j+k-n+\ell}).$$

That is, the $(j + k)$'th reverse diagonal depends linearly on the previous $n$, for $j + k \ge n$. ■

Lemma 13. Every lf-Kerdock matrix is trace-Kerdock.

Proof. There are $2^n$ lf-Kerdock matrices since the top row of $n$ bits enumerates $\mathbb{Z}_2^n$. There are $2^n$ trace-Kerdock matrices $K_\alpha$ since the top-left entry in $D_\alpha = (V^T)^{-1} K_\alpha V^{-1}$ enumerates $\mathbb{F}(2^n)$. So there are equal numbers of lf-Kerdock and trace-Kerdock matrices. Above we showed that every trace-Kerdock is a lf-Kerdock. Our statement follows. ■
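
The two lemmas above can be checked by brute force for $n = 3$. The sketch below assumes $h(t) = 1 + t^2 + t^3$, builds every trace-Kerdock matrix $K_\alpha$ from Definition 10, and confirms that its reverse diagonals obey the linear-feedback recurrence and that the $2^n$ matrices have distinct top rows. The bitmask field arithmetic and all function names are illustrative assumptions, not part of the paper.

```python
import numpy as np

N_BITS = 3
H_COEFFS = (1, 0, 1)   # h(t) = 1 + t^2 + t^3, i.e. h_0 = 1, h_1 = 0, h_2 = 1
H_POLY = 0b1101        # the same polynomial as a bitmask, including the t^3 term
XI = 0b010             # the generator xi = t

def gf_mul(a, b):
    """Multiply two elements of F(2^n) stored as coefficient bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N_BITS) & 1:
            a ^= H_POLY
    return result

def trace(x):
    """Tr(x) = x + x^2 + ... + x^(2^(n-1))."""
    total, power = 0, x
    for _ in range(N_BITS):
        total ^= power
        power = gf_mul(power, power)
    return total

def trace_kerdock(alpha):
    """Return (K_alpha, reverse diagonals), with K_alpha(j, k) = Tr(alpha * xi^(j+k))."""
    diag, power = [], alpha            # power runs through alpha * xi^m
    for _ in range(2 * N_BITS - 1):
        diag.append(trace(power))
        power = gf_mul(power, XI)
    K = np.array([[diag[j + k] for k in range(N_BITS)] for j in range(N_BITS)])
    return K, diag

def satisfies_feedback(diag):
    """Check a_j = sum_{l<n} h_l a_{j-n+l} (mod 2) for all j >= n."""
    n = N_BITS
    return all(
        diag[j] == sum(H_COEFFS[l] * diag[j - n + l] for l in range(n)) % 2
        for j in range(n, 2 * n - 1)
    )

results = [trace_kerdock(alpha) for alpha in range(2 ** N_BITS)]
all_lf = all(satisfies_feedback(diag) for _, diag in results)
top_rows = {tuple(diag[:N_BITS]) for _, diag in results}
print(all_lf, len(top_rows))
```

Because the top rows are pairwise distinct, the $2^n$ trace-Kerdock matrices account for every possible top row, which is the counting step in the proof of Lemma 13.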
Thus we have

Theorem 14. The set of lf-Kerdock matrices is a maximal lf-Kerdock set.

Henceforth, we refer to lf-Kerdock and trace-Kerdock matrices as "Kerdock matrices." As above, a Kerdock code is defined from a Kerdock set $\mathcal{K}$, as $\{\varphi_{P,\ell} : P \in \mathcal{K},\ \ell \in \mathbb{Z}_2^n\}$.

3.2. Example Kerdock matrix construction. Let $n = 3$. The polynomial $h(t) = 1 + t^2 + t^3 = h_0 + h_2 t^2 + t^3$ is a primitive polynomial over Z2 of degree 3. A $3 \times 3$ Kerdock matrix

$$\begin{pmatrix} a_0 & a_1 & a_2 \\ a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \end{pmatrix}$$

has five reverse diagonal parameters, $a_0, \ldots, a_4$. We construct $P \in \mathcal{K}$ by choosing the top row $(a_0\ a_1\ a_2)$ arbitrarily, e.g., $(a_0\ a_1\ a_2) = (1\ 1\ 1)$. The two remaining reverse diagonals $a_3$ and $a_4$ are given by

$$a_3 = a_0 + a_2 = 1 + 1 = 0 \quad\text{and}\quad a_4 = a_1 + a_3 = 1 + 0 = 1.$$

This results in the matrix

$$P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$
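
The example construction can be reproduced with a few lines of code. This sketch fills in the reverse diagonals by the linear-feedback rule $a_j = \sum_{\ell<n} h_\ell a_{j-n+\ell} \pmod 2$ and then checks, for $n = 3$, that the sum of any two distinct matrices in the resulting set has full rank over $\mathbb{F}(2)$. The function names and the tuple encoding of $h$ (coefficients $h_0, \ldots, h_{n-1}$, leading term omitted) are our own assumptions.

```python
import itertools
import numpy as np

def lf_kerdock(top_row, h):
    """Hankel matrix whose reverse diagonals extend top_row by linear feedback with h."""
    n = len(top_row)
    a = list(top_row)
    for j in range(n, 2 * n - 1):
        a.append(sum(h[l] * a[j - n + l] for l in range(n)) % 2)
    return np.array([[a[j + k] for k in range(n)] for j in range(n)])

def rank_gf2(M):
    """Rank over F(2) by Gaussian elimination."""
    M = np.array(M) % 2
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]
        rank += 1
    return rank

H_COEFFS = (1, 0, 1)                   # h(t) = 1 + t^2 + t^3
P = lf_kerdock((1, 1, 1), H_COEFFS)    # the example with top row (1 1 1)

kerdock_set = [lf_kerdock(row, H_COEFFS) for row in itertools.product([0, 1], repeat=3)]
pairwise_full_rank = all(
    rank_gf2(A + B) == 3 for A, B in itertools.combinations(kerdock_set, 2)
)
print(P)
print(pairwise_full_rank)
```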




3.3. Properties of Kerdock codes. We now give a lemma that will be useful later as well as in its own right.

Lemma 15. Fix a primitive polynomial $h$ for defining a finite field and for the Kerdock properties. Let $P$ be a symmetric matrix. The following are equivalent:

• $P$ is Kerdock;

• For all $r$ and $s$, we have $[r]^T P [s] = [\sqrt{rs}]^T P [\sqrt{rs}] \bmod 2$;

• For all $x$, $y$ and $z$ we have $[x]^T P [yz] = [xy]^T P [z] \bmod 2$.

Proof. First we show that the two algebraic statements are equivalent. Suppose $[x]^T P [yz] = [xy]^T P [z]$ holds for all $x$, $y$, and $z$. Then, given non-zero $r$ and $s$, put $x = r$, $y = \sqrt{s/r}$, and $z = \sqrt{rs}$; it follows that $[r]^T P [s] = [\sqrt{rs}]^T P [\sqrt{rs}]$. Conversely, if $[r]^T P [s] = [\sqrt{rs}]^T P [\sqrt{rs}]$ for all $r$ and $s$, then, given $x, y, z$, we have $[x]^T P [yz] = [\sqrt{xyz}]^T P [\sqrt{xyz}] = [xy]^T P [z]$, first putting $r = x$ and $s = yz$ and then putting $r = xy$ and $s = z$.

Now, suppose $P$ is Kerdock and fix $x$, $y$, and $z$. By linearity, it suffices to consider $x = \xi^j$ and $z = \xi^k$, for $0 \le j, k < n$. Because $\xi$ is a multiplicative generator, it suffices to consider $y = \xi$. If $j < n - 1$ and $k < n - 1$, then $[x]^T P [yz] = [xy]^T P [z]$ follows from Hankelness. If $j < n - 1$ and $k = n - 1$, then $[xy]^T P [z] = [\xi^{j+1}]^T P [\xi^{n-1}]$. By the linear feedback Kerdock property, this equals $\sum_{\ell < n} h_\ell [\xi^j]^T P [\xi^\ell]$. By linearity, this is $[\xi^j]^T P [\sum_{\ell < n} h_\ell \xi^\ell]$. By definition of $h$, this is $[\xi^j]^T P [\xi^n] = [x]^T P [yz]$, as desired. A similar analysis holds if $j = n - 1$ and $k < n - 1$. The case $j = k = n - 1$ follows from symmetry of $P$.

Conversely, suppose $[x]^T P [yz] = [xy]^T P [z]$ for all $x$, $y$, and $z$. Consider the $j$'th row of $P$, for $j > 0$. We want to show that it is gotten by shifting the $(j-1)$'st row to the left and setting the rightmost entry of the $j$'th row to the appropriate linear combination of the items in the $(j-1)$'st row. Put $x = \xi^{j-1}$, $y = \xi$, and $z = \xi^k$. Then, for $k < n - 1$, Hankelness (and, therefore, the statement) follows immediately. For $k = n - 1$, we have $yz = \xi^n = \sum_{\ell < n} h_\ell \xi^\ell$, and the statement follows by additivity of the trace. ■
3.4. Kerdock and random variables of limited independence. We first give a definition of limited independence for a family of random variables. This is satisfied by the positions in a random Kerdock codeword.

Definition 16. A Z4-code is 3.5-wise independent if the distribution on any three positions of a random codeword is uniform on $(\mathbb{Z}_4)^3$ and, conditioned on any three positions, for any fourth position $X$, we have $\Pr(X = 0) = \Pr(X = 2)$ and $\Pr(X = 1) = \Pr(X = 3)$.

For example, in our construction, the joint distribution is uniformly random conditioned on the sum of the four positions being 0 or 2 mod 4.

The notion of 3.5-wise independence is useful because it can substitute for 4-wise independence in some cases, even when 3-wise independence cannot. Consider the exponentiated version of a 3.5-wise independent family, so each value is $\pm 1$ or $\pm i$. Then any four random variables $W, X, Y, Z$ satisfy that $W, X, Y$ are independent and, conditioned on $W, X, Y$, the expectation of $Z$ is 0, because $Z = \pm 1$ uniformly conditioned on $Z$ real and $Z = \pm i$ uniformly conditioned on $Z$ imaginary; the probability that $Z$ is real is arbitrary. Thus, in a family of $N$ 3.5-wise independent random variables, the first four moments agree (up to constant factors) with the moments of a truly random family. We have $E[|\sum_j X_j|^k] = \Theta(N^{k/2})$ for $k = 0, 2, 4$ and $E[(\sum_j X_j)^k] = 0$ for $k = 1, 3$. For the 3-wise independent family RM(1), we have $E[|\sum_j X_j|^4] = \Theta(N^3)$, since, for any triple $W, X, Y$ of variables, there is exactly one fourth variable $Z$ such that $WXYZ = 1$ and all other 4-tuples have zero expectation.

The following had been known, but not previously presented in terms of 3.5-wise independence.

Lemma 17. For odd $n \ge 3$, the Kerdock code is 3.5-wise independent but not 4-wise independent. For even $n \ge 3$, the Kerdock code is 3-wise but not 3.5-wise independent.

Lemma 18. For any $n \ge 3$, the Z2 code of Gray-mapped Kerdocks is 4-wise independent.



4. Fast List Decoding of Kerdock and Hankel Codes

In this section we show how to perform quick list-decoding of the Hankel code. That is, we are given chosen-sampling access to a signal $s$ and a parameter $k < N^c$ for some small constant $c > 0$; our goal is to find, with high probability, all Hankel codewords $\varphi_{P,\ell}$ such that $|\langle \varphi_{P,\ell}, s \rangle|^2 \ge (1/k)\|s\|^2$, in time $\mathrm{poly}(k \log N)$.

Our algorithm is a straightforward generalization of the algorithm of [KM91]. We do not give all the details of this algorithm; instead, we refer the reader to [KM91]. Loosely speaking, the algorithm in [KM91] finds $\ell$ for which $\varphi_{0,\ell}$ has large dot product with $s$ by maintaining a set of candidates for the first $j$ bits of $\ell$. For $j = 1, 2, 3, \ldots, n$, the algorithm extends each candidate from $j - 1$ to $j$ bits in all (two) ways, then tests each new candidate. The tests ensure that the number of candidates remains bounded, so the algorithm remains efficient.

Our algorithm will attempt to find first the $P$ matrix of each vector $\varphi_{P,\ell}$ with $|\langle \varphi_{P,\ell}, s \rangle|^2 \ge (1/k)\|s\|^2$. We will call such $P$ and such $\varphi_{P,\ell}$ heavy for $s$. Then the algorithm will find the $\ell$ part by demodulating out the contribution of $P$, and using the algorithm in [KM91] to look for heavy RM(1) vectors for $s\varphi^*_{P,0}$, where $\varphi^* = 1/(N\varphi)$ is the componentwise complex conjugate of $\varphi$. This strategy relies on the fact that, up to normalization, $\langle \varphi_{P,\ell}, s \rangle = \langle \varphi_{0,\ell}, s\varphi^*_{P,0} \rangle$, so $\varphi_{0,\ell}$ is heavy for $s\varphi^*_{P,0}$ when $\varphi_{P,\ell}$ is heavy for $s$.

To find $P$, our algorithm follows the overall structure of [KM91]. For $j \le n$, we will maintain a set of candidates for the upper-left $j$-by-$j$ submatrix of $P$. The candidates will all be Hankel. For each candidate, we will consider extending it to a $(j+1)$-by-$(j+1)$ Hankel matrix, in one of 4 possible ways. We then test each extended candidate in such a way that, with high probability, all true candidates are kept (no false negatives) but the total number of candidates kept is small enough that our algorithm is efficient. (We describe the retention criterion and test in more detail below.) Much of the [KM91] algorithm works unchanged in our context; we give few comments on those aspects and instead focus on the changes necessary for the Hankel setting and the reasons our algorithm works for Hankel but not for RM(2). In particular, the retention criterion and test we use are similar to those in [KM91] and the guarantee of no false negatives is similar; the main technical work is showing that there are few (true or false) positives in the new context. That is, the analysis is as follows:

(1) Our algorithm is correct (finds all true candidates), by an analysis similar to [KM91].

(2) Our algorithm is efficient:

• As in [KM91], the efficiency of our algorithm reduces to a non-algorithmic and non-probabilistic fact about the number of codewords with large dot product to the signal and the number of extensions of a $(j-1)$-by-$(j-1)$ candidate to a $j$-by-$j$ candidate.

• For the Hankel code in particular, we bound the number of codewords with large dot products and the number of extensions of a single candidate. This is the only part of the proof where we will be formal since this is where our algorithm departs from previous work.
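
The prefix-extension strategy attributed to [KM91] above can be sketched for the RM(1) case as follows. This is a simplified, deterministic illustration and not the algorithm of [KM91] itself: it reads the whole signal and computes bucket energies exactly from a full Walsh-Hadamard transform, whereas the real algorithm estimates these energies from a small number of random samples. The function names and the convention that a prefix consists of the high-order bits of $\ell$ are our own assumptions.

```python
import numpy as np

def walsh_hadamard(s):
    """Coefficients <phi_l, s> for all l, with phi_l(y) = (-1)^(l . y) / sqrt(N)."""
    coeffs = np.array(s, dtype=complex)
    h = 1
    while h < len(coeffs):
        for start in range(0, len(coeffs), 2 * h):
            a = coeffs[start:start + h].copy()
            b = coeffs[start + h:start + 2 * h].copy()
            coeffs[start:start + h] = a + b
            coeffs[start + h:start + 2 * h] = a - b
        h *= 2
    return coeffs / np.sqrt(len(coeffs))

def heavy_rm1(s, k):
    """Return all l with |<phi_l, s>|^2 >= (1/k) ||s||^2, found by growing prefixes of l."""
    s = np.asarray(s, dtype=complex)
    n = len(s).bit_length() - 1            # N = 2^n
    energy = np.abs(walsh_hadamard(s)) ** 2
    threshold = np.sum(np.abs(s) ** 2) / k
    candidates = [0]                       # the empty prefix
    for j in range(1, n + 1):
        extended = []
        for prefix in candidates:
            for bit in (0, 1):             # extend in both possible ways
                p = 2 * prefix + bit
                lo = p << (n - j)          # first l sharing this j-bit prefix
                hi = (p + 1) << (n - j)
                # Keep the prefix only if its bucket carries enough energy.
                if energy[lo:hi].sum() >= threshold:
                    extended.append(p)
        candidates = extended
    return candidates

# Usage: a signal made of two RM(1) vectors, indexed by l = 3 and l = 12 (n = 4).
N = 16
ys = np.arange(N)
phi = lambda l: np.array([(-1) ** bin(l & y).count("1") for y in ys]) / np.sqrt(N)
signal = phi(3) + phi(12)
print(heavy_rm1(signal, k=4))
```

By Parseval's identity the bucket energies at each level sum to $\|s\|^2$, so at most $k$ prefixes survive each round; this is the boundedness of the candidate set that the text refers to.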

Let us write $P \succeq \tilde{P}$ if $\tilde{P}$ is a square submatrix of $P$, consisting of the upper left $j$-by-$j$ corner of $P$ for some $j$. As in [KM91], we have an ideal testing criterion for submatrices.
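
To make the candidate-extension step concrete, the sketch below represents a $j$-by-$j$ Hankel candidate by its $2j - 1$ reverse-diagonal parameters. Extending to $(j+1)$-by-$(j+1)$ appends two new reverse diagonals, which accounts for the four possible extensions mentioned earlier. The function names are illustrative, and `is_extension` simply tests the upper-left-corner relation defined above.

```python
import itertools
import numpy as np

def hankel_from_diagonals(a):
    """Build the j-by-j Hankel matrix with reverse diagonals a_0 .. a_{2j-2}."""
    j = (len(a) + 1) // 2
    return np.array([[a[r + c] for c in range(j)] for r in range(j)])

def extend_hankel(a):
    """Yield the reverse-diagonal lists of the four (j+1)-by-(j+1) Hankel extensions."""
    for new in itertools.product([0, 1], repeat=2):
        yield list(a) + list(new)

def is_extension(P, P_small):
    """True if P_small is the upper-left corner of P."""
    j = P_small.shape[0]
    return np.array_equal(P[:j, :j], P_small)

base_diagonals = [1, 0, 1]                         # a 2-by-2 candidate
base = hankel_from_diagonals(base_diagonals)
extensions = [hankel_from_diagonals(d) for d in extend_hankel(base_diagonals)]
print(len(extensions), all(is_extension(E, base) for E in extensions))
```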

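Criterion 19. A testing procedure keeps candidate $\tilde{P}$ iff there exists some $n$-by-$n$ matrix $P \succeq \tilde{P}$ and some $\ell \in \mathbb{Z}_2^n$ with $|\langle \varphi_{P,\ell}, s \rangle|^2 \ge (1/k)\|s\|^2$. That is, the procedure keeps $\tilde{P}$ iff there exists some unit-norm complex number $c$, some $P \succeq \tilde{P}$, and some $\ell$ with $\Re(c\langle \varphi_{P,\ell}, s \rangle) \ge (1/\sqrt{k})\|s\|$.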

We will gradually rewrite and weaken this criterion in a sequence of variations given below. By "weaken," we mean that a "weaker" criterion will keep more matrices than a "stronger" criterion. First, for each $j$, for each string $y''$ of length $n - j$, and for indeterminate $y' \in \mathbb{Z}_2^j$, define the restriction $(R_{y''}s)$ by $(R_{y''}s)(y') = s(y'y'')$. (Note that, if $\|\varphi\| = 1$, then $R_{y''}\varphi$ is not a unit vector. We have $\|R_{y''}\varphi\|^2 = 2^{j-n}$.)

Because $|\varphi_{P,\ell}|$ is constant, if $\langle R_{y''}\varphi_{P,\ell}, R_{y''}s\rangle$ is large, then there must be many (small) 
contributions. Formally: 

Lemma 20. Suppose $|\varphi| = 2^{-n/2}$, $\|s\| = 1$, and $|\langle \varphi, s\rangle| \ge \sqrt{1/k}$. Then, for each j, there are at 
least $2^{n-j}/(4k)$ of $y'' \in \mathbb{Z}_2^{n-j}$ such that $|\langle R_{y''}\varphi, R_{y''}s\rangle| \ge (1/\sqrt{4k})\,2^{j-n}$. 

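Proof. Suppose not. Let $\psi$ be $\varphi$ restricted to the $y = y'y''$ with $|\langle R_{y''}\varphi, R_{y''}s\rangle| \ge (1/\sqrt{4k})\,2^{j-n}$, 
so the support of $\psi$ has size less than $2^n/(4k)$, and so $\|\psi\|^2 < 1/(4k)$. Then the at-most-$2^{n-j}$ 
possible $(y'')$'s with $|\langle R_{y''}\varphi, R_{y''}s\rangle| < (1/\sqrt{4k})\,2^{j-n}$ contribute a total of at most $1/\sqrt{4k}$ toward 
$\langle \varphi, s\rangle$, i.e., $|\langle \varphi - \psi, s\rangle| \le 1/\sqrt{4k}$. It follows that 

\[
\begin{aligned}
|\langle \varphi, s\rangle| &\le |\langle \psi, s\rangle| + |\langle \varphi - \psi, s\rangle| \\
&\le \|\psi\|\,\|s\| + 1/\sqrt{4k} \\
&< 1/\sqrt{4k} + 1/\sqrt{4k} \\
&= 1/\sqrt{k},
\end{aligned}
\]

a contradiction. ■ 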
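
As a sanity check on Lemma 20, the following minimal sketch counts the heavy restrictions numerically for a small random instance. It assumes the codeword form $\varphi(y) = 2^{-n/2}\, i^{y^T P y + 2\ell^T y}$ with a binary symmetric $P$; the function names, the noise level, and the bit ordering of the index are illustrative choices, not part of the paper.

```python
import random

PHASES = [1, 1j, -1, -1j]  # i^0, i^1, i^2, i^3


def phase_vector(P, ell, n):
    """phi(y) = 2^(-n/2) * i^(y^T P y + 2 ell^T y) for y in {0,1}^n."""
    vec = []
    for idx in range(2 ** n):
        # y_0 is the most significant bit of idx, so y' (first j bits) is the
        # high part of the index and y'' (last n - j bits) is the low part.
        y = [(idx >> (n - 1 - t)) & 1 for t in range(n)]
        quad = sum(P[a][b] * y[a] * y[b] for a in range(n) for b in range(n))
        lin = sum(ell[a] * y[a] for a in range(n))
        vec.append(2 ** (-n / 2) * PHASES[(quad + 2 * lin) % 4])
    return vec


def heavy_restrictions(phi, s, n, j, k):
    """Number of y'' with |<R_{y''} phi, R_{y''} s>| >= 2^(j-n) / sqrt(4k)."""
    threshold = 2 ** (j - n) / (4 * k) ** 0.5
    count = 0
    for ypp in range(2 ** (n - j)):
        dot = sum(phi[(yp << (n - j)) | ypp].conjugate() * s[(yp << (n - j)) | ypp]
                  for yp in range(2 ** j))
        if abs(dot) >= threshold:
            count += 1
    return count


def lemma20_holds(n, seed):
    """Random codeword plus noise (assumed noise level 0.05); check every j."""
    rng = random.Random(seed)
    P = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            P[a][b] = P[b][a] = rng.randint(0, 1)
    ell = [rng.randint(0, 1) for _ in range(n)]
    phi = phase_vector(P, ell, n)
    noisy = [v + 0.05 * complex(rng.gauss(0, 1), rng.gauss(0, 1)) for v in phi]
    norm = sum(abs(v) ** 2 for v in noisy) ** 0.5
    s = [v / norm for v in noisy]                 # unit-norm signal
    corr = abs(sum(p.conjugate() * v for p, v in zip(phi, s)))
    k = int(1 / corr ** 2) + 1                    # guarantees corr >= sqrt(1/k)
    return all(heavy_restrictions(phi, s, n, j, k) >= 2 ** (n - j) / (4 * k)
               for j in range(1, n + 1))
```

Calling `lemma20_holds(5, seed)` for a few seeds exercises the bound for every prefix length j; the check is exhaustive over $y''$, so it is only practical for small n.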

Thus we can weaken Criterion 19 to: 

Criterion 21. A testing procedure keeps candidate $P$ iff there exists some n-by-n matrix $\widehat{P} \succeq P$, a 
unit-magnitude complex number $c$, and some $\ell \in \mathbb{Z}_2^n$ such that, for at least $2^{n-j}/(4k)$ of $y'' \in \mathbb{Z}_2^{n-j}$, 
we have $\Re(c\langle R_{y''}\varphi_{\widehat{P},\ell}, R_{y''}s\rangle) \ge (1/\sqrt{4k})\,2^{j-n}\|s\|$. 
Next, weaken Criterion 21 to 

Criterion 22. A testing procedure keeps candidate $P$ iff there exists some n-by-n matrix $\widehat{P} \succeq P$ 
and, for at least $2^{n-j}/(4k)$ of $y'' \in \mathbb{Z}_2^{n-j}$, there exists some unit-norm $c_{y''}$ and some $\ell_{y''} \in \mathbb{Z}_2^n$ with 

\[
\Re\left(c_{y''}\langle R_{y''}\varphi_{\widehat{P},\ell_{y''}}, R_{y''}s\rangle\right) \ge (1/\sqrt{4k})\,2^{j-n}\|s\|.
\]

Next, we will show that we need not search over all possible $\widehat{P} \succeq P$; a single fixed extension $P'$ 
will suffice. (For example, $P'$ might extend $P$ with zeros; $P'$ need not even be Hankel.) We will 
use the notation $P'$ in the sequel. 

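Lemma 23. Fix parameter n, $j \le n$, j-by-j matrix $P$, extension $P' \succeq P$, and $y'' \in \mathbb{Z}_2^{n-j}$. Then 

\[
\{R_{y''}\varphi_{\widehat{P},\ell} : \widehat{P} \succeq P,\ \ell \in \mathbb{Z}_2^n\} = \{R_{y''}\varphi_{P',\ell} : \ell \in \mathbb{Z}_2^n\}.
\]

Proof. Write an extension $\widehat{P}$ to $P$ as 

\[
\widehat{P} = \begin{pmatrix} P & P_1^T \\ P_1 & P_2 \end{pmatrix}
\]

and write $\ell^T = (\ell_1^T \mid \ell_2^T)$, where $\ell_1 \in \mathbb{Z}_2^j$. Then, at $y = y'y''$, we have 

\[
\begin{aligned}
y^T \widehat{P} y + 2\ell^T y &= (y')^T P y' + 2(y'')^T P_1 y' + (y'')^T P_2 y'' + 2\ell_1^T y' + 2\ell_2^T y'' \\
&= (y')^T P y' + 2\left((y'')^T P_1 + \ell_1^T\right) y' + \left((y'')^T P_2 y'' + 2\ell_2^T y''\right).
\end{aligned}
\]

If we fix $y''$ but let $\ell$ vary, the expression $2((y'')^T P_1 + \ell_1^T)$ varies over all of $2\mathbb{Z}_2^j$, whether or not 
we let $P_1$ vary. Similarly, if we fix $y''$ but let the coefficient $c$ vary, the expression $c\, i^{(y'')^T P_2 y'' + 2\ell_2^T y''}$ 
varies over unit-norm complex numbers, whether or not we let $P_2$ (and $\ell_2$) vary. ■ 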
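
The block decomposition in the proof of Lemma 23 can be checked mechanically. The sketch below, with hypothetical helper names, compares the exponent computed from the full matrix $\widehat{P}$ with the regrouped form in which $P_1$ is absorbed into a shifted linear term on $y'$ and everything depending only on $y''$ is collected into a constant; the two agree modulo 4, which is all that matters for powers of $i$.

```python
import random
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quad(M, v):
    return sum(M[a][b] * v[a] * v[b] for a in range(len(v)) for b in range(len(v)))


def full_exponent(P, P1, P2, ell1, ell2, yp, ypp):
    """y^T P_hat y + 2 ell^T y with P_hat = [[P, P1^T], [P1, P2]] and y = y'y''."""
    j, m = len(yp), len(ypp)
    P_hat = [P[a] + [P1[c][a] for c in range(m)] for a in range(j)]
    P_hat += [P1[c] + P2[c] for c in range(m)]
    return quad(P_hat, yp + ypp) + 2 * dot(ell1 + ell2, yp + ypp)


def split_exponent(P, P1, P2, ell1, ell2, yp, ypp):
    """Quadratic in y' + shifted linear term in y' + a term constant in y'."""
    j, m = len(yp), len(ypp)
    # (y'')^T P1 + ell1^T, reduced mod 2: P1 only shifts the linear term.
    shifted = [(sum(ypp[c] * P1[c][b] for c in range(m)) + ell1[b]) % 2
               for b in range(j)]
    constant = quad(P2, ypp) + 2 * dot(ell2, ypp)
    return quad(P, yp) + 2 * dot(shifted, yp) + constant


def identity_holds(j, m, seed):
    """Check the identity mod 4 for one random extension and all y = y'y''."""
    rng = random.Random(seed)

    def sym(size):
        M = [[0] * size for _ in range(size)]
        for a in range(size):
            for b in range(a, size):
                M[a][b] = M[b][a] = rng.randint(0, 1)
        return M

    P, P2 = sym(j), sym(m)
    P1 = [[rng.randint(0, 1) for _ in range(j)] for _ in range(m)]
    ell1 = [rng.randint(0, 1) for _ in range(j)]
    ell2 = [rng.randint(0, 1) for _ in range(m)]
    return all(
        (full_exponent(P, P1, P2, ell1, ell2, list(yp), list(ypp))
         - split_exponent(P, P1, P2, ell1, ell2, list(yp), list(ypp))) % 4 == 0
        for yp in product((0, 1), repeat=j)
        for ypp in product((0, 1), repeat=m))
```

Because `shifted` ranges over all of $\mathbb{Z}_2^j$ as `ell1` does, for any fixed `P1`, the restricted codewords obtained from different extensions coincide up to the unit-norm constant, which is the content of the lemma.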

It follows that we can rewrite Criterion 22 as 

Criterion 24. A testing procedure keeps candidate $P$ iff for at least $2^{n-j}/(4k)$ of $y'' \in \mathbb{Z}_2^{n-j}$ there 
exists some unit-norm $c_{y''}$ and some $\ell_{y''} \in \mathbb{Z}_2^n$ with 

\[
\Re\left(c_{y''}\langle R_{y''}\varphi_{P',\ell_{y''}}, R_{y''}s\rangle\right) \ge (1/\sqrt{4k})\,2^{j-n}\|s\|.
\]

Finally, we will not be able to compute the test exactly, but we will approximate with samples. 
To that end, we need to have two thresholds, with a gap. Formally, we want the following criterion, 
in which both the first and third cases represent a weakening, compared with Criterion 24. 

Criterion 25. A testing procedure of a j-by-j Hankel matrix $P$ and signal $s$ with parameters $c_1$ 
and $c_2$ (determined below) behaves as follows. 

• If for at least $2^{n-j}/(4k)$ of $y'' \in \mathbb{Z}_2^{n-j}$ there exists some unit-norm $c_{y''}$ and some $\ell_{y''} \in \mathbb{Z}_2^n$ 
with $\Re(c_{y''}\langle R_{y''}\varphi_{P',\ell_{y''}}, R_{y''}s\rangle) \ge (1/\sqrt{4k})\,2^{j-n}\|s\|$, the procedure keeps $P$ with high 
probability. 

• If only for less than $c_1 2^{n-j}/(4k)$ of $y'' \in \mathbb{Z}_2^{n-j}$ does there exist some unit-norm $c_{y''}$ and some 
$\ell_{y''} \in \mathbb{Z}_2^n$ with $\Re(c_{y''}\langle R_{y''}\varphi_{P',\ell_{y''}}, R_{y''}s\rangle) \ge (1/(c_2\sqrt{4k}))\,2^{j-n}\|s\|$, the procedure drops $P$ 
with high probability. 

• (The procedure may behave arbitrarily, otherwise.) 

Our algorithm will also need an estimate for $\|s\|$. Here we simply assume that $\|s\|$ is known, say, 
up to the factor 2. Alternatively, one might assume the dynamic range of the problem is bounded, 
i.e., that $1/M \le \|s\| \le M$ for some known M. The algorithm could then try all $O(\log(M))$ possible 
powers of two in the range 1/M to M; one of them is a factor-2 approximation to $\|s\|$. This leads to an extra 
factor of log(M) in some costs. One can also get an appropriate approximation to $\|s\|$ from samples 
to s without an assumption about the dynamic range. We omit details; see par ... 02. 
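
The bounded-dynamic-range option can be illustrated with a short sketch. The function names are hypothetical; the second function uses the true norm only to confirm that the grid contains a factor-2 approximation, whereas the algorithm itself would simply try every guess.

```python
import math


def norm_guesses(M):
    """Powers of two covering the range [1/M, M]; there are O(log M) of them."""
    lo = math.floor(math.log2(1.0 / M))
    hi = math.ceil(math.log2(M))
    return [2.0 ** e for e in range(lo, hi + 1)]


def factor_two_guesses(norm_s, M):
    """Guesses g with g <= norm_s < 2g; nonempty whenever 1/M <= norm_s <= M."""
    return [g for g in norm_guesses(M) if g <= norm_s < 2 * g]
```

Running the main algorithm once per element of `norm_guesses(M)` is what produces the extra factor of log(M) mentioned above.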



We use the following straightforward efficient sampling algorithm to implement Criterion 25, for 
which there exist suitable $c_1$ and $c_2$: 

Algorithm 26. Assuming $\|s\|$ is known to within a constant factor (that is absorbed into $c_2$): 

(1) For each $y'' \in \mathbb{Z}_2^{n-j}$ such that $\|R_{y''}s\|^2 \le (40k/c_1)\,2^{j-n}\|s\|^2$, we can use the [KM91] 
algorithm to determine whether $\ell_{y''}$ and $c_{y''}$ for Criterion 25 exist. 

(2) To determine whether at least $2^{n-j}/(4k)$ or at most $c_1 2^{n-j}/(4k)$ of the $y'' \in \mathbb{Z}_2^{n-j}$ satisfy 
our condition, sample approximately $k/c_1$ of the $y''$'s; repeat to drive down failure 
probability. Note that there are at most $(c_1/10)\,2^{n-j}/(4k)$ possible $(y'')$'s where 
$\|R_{y''}s\|^2 > (40k/c_1)\,2^{j-n}\|s\|^2$. The algorithm can behave arbitrarily on these $y''$ and still distinguish 
"at most $c_1 2^{n-j}/(4k)$" from "at least $2^{n-j}/(4k)$." 



In summary, the following is a direct generalization of previous work on RM(1) (e.g., [KM91]) 
concerning false negatives, for which there is nothing special about RM(1) or Hankel: 

Proposition 27. Fix parameter n and signal s of length $N = 2^n$. 

• For any k and any $j \le n$, any procedure satisfying Criterion 25 keeps, with high probability, 
all j-by-j Hankel matrices $P$ for which there is some $\widehat{P} \succeq P$ and some $\ell \in \mathbb{Z}_2^n$ with 
$|\langle s, \varphi_{\widehat{P},\ell}\rangle|^2 \ge (1/k)\|s\|^2$. 

• Algorithm 26 satisfies Criterion 25. 

• Algorithm 26 runs in time poly(k log(N)). 
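
The sampling step of Algorithm 26 might be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: `satisfies` is a hypothetical oracle standing in for the per-$y''$ test (the [KM91]-style search for $\ell_{y''}$ and $c_{y''}$), and the midpoint cutoff between the two fractions $1/(4k)$ and $c_1/(4k)$ is one possible decision rule.

```python
import math
import random


def keep_by_sampling(satisfies, n_minus_j, k, c1, repeats=20, rng=random):
    """Estimate the fraction of y'' passing the test; keep iff it looks large.

    Draws `repeats` rounds of about k / c1 uniform samples of y'' (encoded as
    integers with n_minus_j bits) and compares the empirical fraction with the
    midpoint of the two thresholds 1/(4k) and c1/(4k).
    """
    samples_per_round = math.ceil(k / c1)
    total = repeats * samples_per_round
    hits = 0
    for _ in range(total):
        ypp = rng.getrandbits(n_minus_j) if n_minus_j > 0 else 0
        if satisfies(ypp):
            hits += 1
    cutoff = (1 + c1) / (8 * k)      # assumed rule: midpoint of the two fractions
    return hits / total >= cutoff
```

Increasing `repeats` plays the role of "repeat to drive down failure probability"; the number of samples never depends on $2^{n-j}$.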

Thus we have shown that each call to Algorithm 26, to test a single candidate, is efficient. We 
will call Algorithm 26 on many candidates as follows. 

Algorithm 28. Start with the exhaustive candidate set $C_1$ for 1-by-1 matrices $P$. For j increasing 
from 1 to $n - 1$, extend all candidates in $C_j$ to (j + 1)-by-(j + 1) Hankel matrices in all possible 
ways. Call Algorithm 26 to test each candidate extension. 
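
The candidate-growing loop of Algorithm 28 can be sketched as below. The representation is an assumption made for illustration: a j-by-j binary Hankel matrix is stored as the tuple of its $2j-1$ reverse-diagonal values, so entry (a, b) equals `h[a + b]`, and `keep` is a hypothetical stand-in for the call to Algorithm 26.

```python
def hankel_matrix(h):
    """Expand reverse-diagonal values h (length 2j - 1) into a j-by-j matrix."""
    j = (len(h) + 1) // 2
    return [[h[a + b] for b in range(j)] for a in range(j)]


def hankel_extensions(h):
    """All Hankel extensions by one row and column: two new bits a and b."""
    return [h + (a, b) for a in (0, 1) for b in (0, 1)]


def grow_candidates(n, keep):
    """Extend candidates from 1-by-1 up to n-by-n, filtering with `keep`."""
    candidates = [(0,), (1,)]                 # exhaustive 1-by-1 set C_1
    for _ in range(1, n):                     # j = 1, ..., n - 1
        extended = [e for h in candidates for e in hankel_extensions(h)]
        candidates = [h for h in extended if keep(h)]
    return candidates
```

With a `keep` that accepts everything, the list grows by a factor of four per stage; the analysis that follows is about why a real test keeps the list polynomial in k.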

It remains to show that the number of candidates $P$ under consideration remains under control. 
Let f(j) denote the number of j-by-j candidates considered. Each candidate will be extended to 
a (j + 1)-by-(j + 1) Hankel matrix in all possible ways, getting g(j + 1) possible (j + 1)-by-(j + 1) 
candidates. Then Criterion 22 will be applied to each candidate, reducing the number of candidates 
from g(j + 1) to f(j + 1). We need to bound both f(j) and g(j). We first bound g(j + 1) by 4f(j): 

Lemma 29. Algorithm 28 constructs only four extensions to any candidate. 

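Proof. Note that a j-by-j candidate $P$ extends to (j + 1)-by-(j + 1) in only four ways, since there 
are only two new possible bits, a and b: 

\[
\begin{pmatrix}
 & & \vdots \\
 & P & a \\
\cdots & a & b
\end{pmatrix}
\]

(The remaining entries of the new row and column lie on reverse diagonals that already meet $P$, 
so $P$ determines them.) 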
[Figure 1: grid with rows labeled by candidates $P$ and columns labeled by the at most $2^{n-j}$ strings $y''$; 
checkmarks mark the pairs for which $P$ is $(c_2/k)$-heavy.] 

Figure 1. Bounding the number of $P$'s that are heavy on many $(y'')$'s. Label 
columns by $(y'')$'s for which $\|R_{y''}s\|^2 \le (k/c_3)\,2^{j-n}\|s\|^2$ and label rows by $P$'s; 
put a checkmark at $(y'', P)$ if $P$ is $(c_2/k)$-heavy for $y''$. We will show below that 
there are few checkmarks in any column; it follows that there are few rows with 
checkmarks in many columns. 



Here we crucially use the fact that the candidates are Hankel. If we were to consider arbitrary 
j-by-j RM(2) matrices, the number of extensions would be $2^{j}$, which is prohibitive. 

Thus it suffices to bound f(j) by a polynomial in k, uniformly for all j. So we need to bound the 
number of $P$'s that are $(c_2/k)$-heavy on more than $(c_1/k)\,2^{n-j}$ of the $y'' \in \mathbb{Z}_2^{n-j}$, where we call a 
candidate $P$ h-heavy on $y''$ if $P$ extends to some $\widehat{P}$ with some $\ell_{y''}$ satisfying 

\[
\left|\langle R_{y''}\varphi_{\widehat{P},\ell_{y''}}, R_{y''}s\rangle\right| \ge \sqrt{h}\,2^{j-n}\|s\|.
\]

The other candidates are dropped by our criterion. 

As in previous work, it suffices to bound the number of candidates $P$ for each $y''$ and then 
do an averaging argument. There are at most $(c_3/k)\,2^{n-j}$ possible $(y'')$'s for which $\|R_{y''}s\|^2 > 
(k/c_3)\,2^{j-n}\|s\|^2$ and, for constant $c_3$ related to $c_1$, these $(y'')$'s can be ignored in determining 
whether $P$ satisfies the condition of Criterion 22 on at least $(1/(4k))\,2^{n-j}$ or at most $(c_1/(4k))\,2^{n-j}$ 
of the $(y'')$'s. So, henceforth, consider only $y''$ for which $\|R_{y''}s\|^2 \le (k/c_3)\,2^{j-n}\|s\|^2$. Below, for all 
such $y''$, we will bound, by $B_k \le \mathrm{poly}(k)$, the number of $P$ that are $(c_2/k)$-heavy on $y''$. Summing 
over at most $2^{n-j}$ possible $(y'')$'s, there are at most $B_k 2^{n-j}$ pairs $(P, y'')$ where $P$ is $(c_2/k)$-heavy 
for $y''$. Thus there can be at most $B_k \cdot (k/c_1) \le \mathrm{poly}(k)$ possible $P$'s that are $(c_2/k)$-heavy on 
at least $(c_1/k)\,2^{n-j}$ of the $(y'')$'s; i.e., at any stage j, there are at most $f(j) \le \mathrm{poly}(k)$ possible 
candidates considered by our algorithm. See Figure 1. 
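
The averaging step is a plain double-counting argument, which the following sketch makes concrete on a toy table of checkmarks. The data and names are made up for illustration: if no column holds more than B checkmarks, then a table with C columns holds at most B·C checkmarks, so at most B·C/t rows can have t or more.

```python
def column_load(checks):
    """Largest number of checkmarks in any single column (the role of B_k)."""
    per_column = {}
    for cols in checks.values():
        for col in cols:
            per_column[col] = per_column.get(col, 0) + 1
    return max(per_column.values(), default=0)


def heavy_rows(checks, min_cols):
    """Rows (candidates P) with checkmarks in at least min_cols columns."""
    return [row for row, cols in checks.items() if len(cols) >= min_cols]


# Toy table: four hypothetical candidates, eight columns y'' = 0, ..., 7.
EXAMPLE_CHECKS = {
    "P0": {0, 1, 2, 3, 4, 5},
    "P1": {0, 1},
    "P2": {2, 3, 4, 5, 6, 7},
    "P3": {6},
}
NUM_COLUMNS = 8

B = column_load(EXAMPLE_CHECKS)
kept = heavy_rows(EXAMPLE_CHECKS, min_cols=4)
# Double counting: len(kept) * 4 <= total checkmarks <= B * NUM_COLUMNS.
bound = B * NUM_COLUMNS / 4
```

In the text, the number of columns is $2^{n-j}$ and the row threshold is $(c_1/k)\,2^{n-j}$, so the same division gives the bound $B_k \cdot (k/c_1)$.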

Thus we have, from previous work and without specific consideration of the Hankel code, 

Proposition 30. Fix signal s of length $N = 2^n$, fix parameter k, and fix $j \le n$. Suppose, for 
each $y'' \in \mathbb{Z}_2^{n-j}$ with $\|R_{y''}s\|^2 \le (k/c_3)\,2^{j-n}\|s\|^2$, there are at most poly(k) possible j-by-j Hankel 
matrices $P$ that are $(c_2/k)$-heavy for $y''$. Then there are at most poly(k) possible $P$ that are kept 
by our algorithm. 

Finally, we now proceed to Hankel-specific analysis. To simplify notation, and without loss of 
generality, we drop all previous constants. It suffices to show that there are at most poly(k) Hankel 
matrices $P$ such that there exists an $\ell$ with $|\langle s, \varphi_{P,\ell}\rangle|^2 \ge (1/k)\|s\|^2$. 

In an orthonormal basis, by the Parseval Equality, there can be at most k vectors $\varphi$ with 
$|\langle s, \varphi\rangle|^2 \ge (1/k)\|s\|^2$. Similarly, in a $\mu$-incoherent dictionary, i.e., a set of vectors with all dot 
product magnitudes bounded above by $\mu$, if $\mu k$ is at most some constant $c_4 \approx 1/6$, then there are 
at most $O(k)$ such $\lambda$'s [TGMS03, GMS03]. Hankel, however, is not a $\mu$-incoherent set for small $\mu$, 
because there are pairs $P$ and $P'$ of Hankel matrices that differ by a low-rank matrix, whence the 
corresponding vectors $\varphi_{P,0}$ and $\varphi_{P',0}$ have large dot product. Nevertheless, we show that, for each 
$P$, the number of $P'$ such that $P + P'$ has low rank is small. We then show that the set of Hankel 
codewords works like an orthonormal basis or an incoherent set, in the sense that there may be at 
most poly(k) Hankel $\lambda$'s with $|\langle s, \varphi_\lambda\rangle|^2 \ge (1/k)\|s\|^2$. 
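
The Parseval counting bound for an orthonormal basis is easy to observe numerically. The sketch below uses the normalized Walsh-Hadamard basis as an example orthonormal basis (an illustrative choice) and lists the coefficients whose squared magnitude is at least $\|s\|^2/k$; there can never be more than k of them.

```python
def walsh_hadamard(s):
    """Coefficients of s in the orthonormal basis chi_l(y) = N^(-1/2) (-1)^(l.y).

    The length of s must be a power of two.
    """
    coeffs = list(s)
    h = 1
    while h < len(coeffs):
        for start in range(0, len(coeffs), 2 * h):
            for t in range(start, start + h):
                a, b = coeffs[t], coeffs[t + h]
                coeffs[t], coeffs[t + h] = a + b, a - b   # butterfly step
        h *= 2
    scale = len(coeffs) ** -0.5
    return [c * scale for c in coeffs]


def heavy_coefficients(s, k):
    """Indices l with |<s, chi_l>|^2 >= (1/k) ||s||^2; Parseval gives at most k."""
    coeffs = walsh_hadamard(s)
    energy = sum(abs(v) ** 2 for v in s)
    return [idx for idx, c in enumerate(coeffs) if abs(c) ** 2 >= energy / k]
```

For the Hankel code no such bound follows directly from Parseval, because distinct codewords can have large dot product; that is the gap the lemmas below close.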

We now proceed formally. This proceeds in a sequence of lemmas along with Dickson's Theorem, 
all of which have proofs that are elementary or found in existing work. 

Lemma 31. There is a constant $c_4$ such that, for any incoherence parameter $\mu$, $0 \le \mu \le 1$, any 
k, and any signal s, if $\mu k \le c_4$, there are at most $O(k)$ vectors in any set $\Lambda$ such that both of the 
following hold: 

• For all $\varphi \ne \varphi' \in \Lambda$, we have $|\langle \varphi, \varphi'\rangle| \le \mu$. 

• For all $\varphi \in \Lambda$, we have $|\langle s, \varphi\rangle|^2 \ge (1/k)\|s\|^2$. 



Proof. This essentially follows from [TGMS03, GMS03]; we include a sketch of the proof with 
possibly different constants. Suppose, toward a contradiction, there are $\ell > 4k$ vectors in $\Lambda$; wlog, 
$\ell = 4k + 1$, since we can discard the remaining vectors. We may assume that $s = \sum_j a_j\varphi_j$ lies in 
the span of $\Lambda = \{\varphi_j\}$. The idea is to show that $\|s\|^2 \approx \sum_j |a_j|^2$ and $|\langle s, \varphi_j\rangle|^2 \approx |a_j|^2$, so that an 
approximate Parseval equality holds. First, 

\[
\|s\|^2 = \Bigl\|\sum_j a_j\varphi_j\Bigr\|^2 \ge \sum_j |a_j|^2 - \mu(4k+1)\sum_j |a_j|^2
\]

by Cauchy-Schwarz, so that, for some c, 

\[
\sum_j |a_j|^2 \le (1 + c\mu k)\|s\|^2. \tag{2}
\]

On the other hand, for each j, 

\[
\begin{aligned}
|\langle s, \varphi_j\rangle| &= \Bigl|a_j + \sum_{j' \ne j} a_{j'}\langle \varphi_{j'}, \varphi_j\rangle\Bigr| \\
&\le |a_j| + \mu\sum_{j' \ne j} |a_{j'}| \\
&\le |a_j| + \mu\sqrt{4k\sum_{j' \ne j} |a_{j'}|^2} \\
&= |a_j| + O(\mu k)(1/\sqrt{k})\|s\|,
\end{aligned}
\]

so that, for some $c'$, we have $|a_j| \ge |\langle s, \varphi_j\rangle| - O(\mu k)(1/\sqrt{k})\|s\|$, and so 

\[
|a_j|^2 \ge (1 - c'\mu k)^2 (1/k)\|s\|^2.
\]

Summing over all $\ell = 4k + 1$ terms, we get 

\[
\sum_j |a_j|^2 \ge (1 - c'\mu k)^2 (\ell/k)\|s\|^2,
\]

so that, with (2), we get $(1 - c'\mu k)^2(\ell/k) \le (1 + c\mu k)$, or $\ell \le k(1 + c\mu k)(1 - c'\mu k)^{-2}$. Thus, if $\mu k$ 
is a sufficiently small constant, we get $\ell \le 4k$, a contradiction. ■ 



Lemma 32. For any constant $c_4$, there are just $L_k \le \mathrm{poly}(k)$ Hankel matrices of rank at most 
$2\log(k/c_4)$. Equivalently, for each Hankel $P$, there are at most $L_k$ Hankels $P'$ with $\mathrm{rank}(P + P') \le 
2\log(k/c_4)$. 

Proof. Suppose Hankel matrix $P$ has rank r. We claim that $O(r)$ binary parameters determine the 
top half of the matrix (above the main reverse diagonal). Another $O(r)$ parameters determine the 
bottom half, whence the number of such matrices is $2^{O(r)}$. The result follows. 

Write the (r + 1)'st column as a linear combination C of the first r columns. We claim that C and 
the first r entries $p_0, p_1, \ldots, p_{r-1}$ in the top row determine the top half of the matrix. Determine $p_r$ 
from $p_0, p_1, \ldots, p_{r-1}$ and C applied to the top row (row 0). Then, having determined $p_r$, determine 
$p_{r+1}$ from C applied to the first r entries in row 1, i.e., $p_1, p_2, \ldots, p_r$. Proceed to determine $p_{r+2}$ 
from C applied to the first r entries in row 2, i.e., $p_2, p_3, \ldots, p_{r+1}$. The general statement follows 
by induction. 

For example, suppose Hankel $P$ has rank three, the first three reverse diagonal parameters are 
a, b, c, and column 3 is the linear combination C of columns 0, 1, 2. Then, in 

\[
\begin{pmatrix}
a & b & c & d & \cdots \\
b & c & d & e & \cdots \\
c & d & e & f & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{pmatrix}
\]

we get d in row 0, column 3 from a, b, c by applying C in row 0. Now knowing d in addition to 
a, b, c, we get e in row 1, column 3 by applying C to b, c, d in row 1. We get f in row 2, column 3 
by applying C to c, d, e, etc. ■ 
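
For very small sizes, the count in Lemma 32 can be checked by brute force. The sketch below enumerates all n-by-n binary Hankel matrices through their $2n-1$ reverse-diagonal values, computes rank over GF(2), and counts those of rank at most r. The function names are illustrative, and the enumeration is exponential in n, so this is only a sanity check and not part of the algorithm.

```python
from itertools import product


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def hankel_rows(h, n):
    """Rows of the n-by-n Hankel matrix with entry (a, b) = h[a + b], as bitmasks."""
    return [sum(h[a + b] << b for b in range(n)) for a in range(n)]


def count_hankel_rank_at_most(n, r):
    """Number of n-by-n binary Hankel matrices with GF(2) rank at most r."""
    return sum(1 for h in product((0, 1), repeat=2 * n - 1)
               if gf2_rank(hankel_rows(h, n)) <= r)
```

Comparing `count_hankel_rank_at_most(n, r)` with `2 ** (4 * r)` for small n and r gives a quick consistency check against the $2^{O(r)}$ claim.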

In intermediate stages of our algorithm, we need to bound only the number of Hankel matrices 
$P$ that are considered. In the output, however, we need to bound the total number of Hankel 
codewords output, i.e., the number of pairs $(P, \ell)$. We give the latter stronger statement in this 
summary theorem. 

Theorem 33. For any signal s, there are at most poly(k) Hankel codewords $\varphi_{P,\ell}$ with 

\[
|\langle s, \varphi_{P,\ell}\rangle|^2 \ge (1/k)\|s\|^2.
\]

Proof. Suppose there are at least q Hankel codewords $\varphi_{P,\ell}$ with $|\langle s, \varphi_{P,\ell}\rangle|^2 \ge (1/k)\|s\|^2$. For fixed 
$P$, the set $\{\varphi_{P,\ell} : \ell\}$ is an orthonormal basis, so there are at most k possible $\ell$'s for each $P$ with 
$|\langle s, \varphi_{P,\ell}\rangle|^2 \ge (1/k)\|s\|^2$. Thus there are at least q/k matrices $P$ with at least one $\ell$ satisfying 
$|\langle s, \varphi_{P,\ell}\rangle|^2 \ge (1/k)\|s\|^2$. By Lemma 32, there is a set Q of size $|Q| \ge q/(kL_k)$ matrices $P$ having 
an $\ell$ satisfying $|\langle s, \varphi_{P,\ell}\rangle|^2 \ge (1/k)\|s\|^2$ and with $\mathrm{rank}(P + P') > 2\log(k/c_4)$ for all $P \ne P' \in Q$. By 
Dickson's Theorem, for any $P \ne P' \in Q$ and their corresponding $\ell$ and $\ell'$, we have $|\langle \varphi_{P,\ell}, \varphi_{P',\ell'}\rangle| \le (c_4/k)$. 
By Lemma 31, $|Q| \le O(k)$. It follows that $q \le \mathrm{poly}(k)$. ■ 

In summary, we have our main theorem. 
Theorem 34. Let $\{\varphi_\lambda\}$ denote the Hankel code. There is an algorithm that, given parameter k 
and chosen-sampling access to a signal $s \in \mathbb{C}^N$, finds, in time poly(k log(N)), a list containing all 
$\lambda$ with $|\langle s, \varphi_\lambda\rangle|^2 \ge (1/k)\|s\|^2$. 

5. Conclusion 

5.1. Corollaries. A list-decoding algorithm for the Hankel code immediately gives a list-decoding 
algorithm for the Kerdock subcode. Since the Kerdock code is $(1/\sqrt{N})$-incoherent, we immediately 
get a sparse recovery algorithm for Kerdock, using [TGMS03, GMS03]. That is: 

Corollary 35. Let $\{\varphi_\lambda\}$ denote a Kerdock code that is a subset of a Hankel code. There is an 
algorithm that, given parameter k and chosen-sampling access to a signal $s \in \mathbb{C}^N$, finds, in time 
poly(k log(N)), a list containing all $\lambda$ with $|\langle s, \varphi_\lambda\rangle|^2 \ge (1/k)\|s\|^2$. 

Corollary 36. Let $\{\varphi_\lambda\}$ denote a Kerdock code that is a subset of a Hankel code. There is an 
algorithm that, given parameters $k \le \sqrt{N}/6$ and $\epsilon > 0$ and chosen-sampling access to a signal 
$s \in \mathbb{C}^N$, finds, in time $\mathrm{poly}(k\log(N)/\epsilon)$, a set $\Lambda$ of size k and coefficients $c_\lambda$ (i.e., a k-term 
approximation $\tilde{s} = \sum_{\lambda \in \Lambda} c_\lambda\varphi_\lambda$) with $\|\tilde{s} - s\| \le (1 + \epsilon + k^2/\sqrt{N})\|s_k - s\|$, where $s_k$ is the best 
k-term Kerdock approximation to s. 

5.2. Improvements. The cost of our Hankel recovery algorithm is polynomial in k, but high. In 
Lemma 32 we show only that there are at most $2^{4r}$ Hankel matrices of rank r, whence, for each 
Hankel $P$, there are at most $2^{4r} = k^8$ Hankel matrices $P' \ne P$ with $|\langle \varphi_{P,\ell}, \varphi_{P',\ell'}\rangle| \ge (1/k) = 2^{-r/2}$. 
This means we bound the time cost of our algorithm at $k^c$ for c an integer somewhat larger than 
8. We make a few comments: 

• It is easy to see that there are at least $\Omega(k^4)$ Hankel matrices of the rank corresponding to 
dot product $1/k$ (that is, rank r with $2^{-r/2} = 1/k$). If we really 
want to list-decode Hankel rather than Kerdock, the size of the output can really be at 
least approximately $k^5$. Our runtime of $k^c$ will be approximately quadratic in the size of 
the output, which may be acceptable in some contexts.³ 

• A tighter analysis of the way the top and bottom halves of the matrix fit together may 
bound the number of such Hankels more tightly than $k^8$. 

• We have begun to investigate an alternative algorithm that exploits the fact that the 
restriction of a Kerdock codeword to a subfield is a smaller instance of a Kerdock codeword. This 
algorithm is much faster as a list-decoding algorithm for Kerdock only, since it doesn't keep 
so many candidates. But the paradigm of bit-by-bit extensions in the algorithm of [KM91] 
and Algorithm 28 does not work for subfields. 

Faster algorithms to list-decode Kerdock codes will be the subject of future work. 

Other future work will include extensions to the Delsarte-Goethals hierarchy of codes between 
RM(1) and RM(2). As one ascends the hierarchy, the size of the code increases as the maximum 
dot product increases. 
³ The $\approx k^5$ output Hankel codewords come in (possibly overlapping) clusters of approximately $k^4$ vectors each, so 
there are at most approximately k clusters. One might hope to produce a compressed representation of the output 
in less time than it takes to write out the output uncompressed. Note, however, that the boundaries of the clusters 
are generally not smooth, so it will not suffice to output the cluster centers. 